The multi-viewpoint video encoder disclosed herein comprises
a depth estimator (10), a predictor (20) connected to the
depth estimator (10), and a comparator (15) connected to the
predictor (20). In addition, the multi-viewpoint video
encoder has an output, preferably including a multiplexer
(19) for multiplexing the first image, the depth map, the
second viewpoint vector and the prediction errors into a
signal. The multi-viewpoint video encoder also includes a
depth map encoder/compressor (17). The depth map is
compressed according to a video compression standard,
preferably compatible with the MPEG-2 standard. The
multi-viewpoint video encoder further includes a first image
encoder (16). The first image is encoded according to a
video coding standard. In this manner, an MPEG-2 monitor can
display the first image video without any further
modification.
Description
MULTI-VIEWPOINT DIGITAL VIDEO ENCODING

Field of the Invention

The present invention relates to a video decoding
and encoding apparatus and method and, more particularly,
to a multi-viewpoint digital video coder/decoder
and method.

Background of the Invention 

A multi-viewpoint video is a three-dimensional
extension of the traditional movie sequence, in that
multiple perspectives of the same scene exist at any
instant in time. In other words, the multi-viewpoint
video offers the capability of "looking around" objects
in a scene. Thus, typical uses may include interactive
applications, medical surgery technologies, remote
sensing development, virtual reality games, etc.

With the development of digital video technology,
a video data compression standard, namely the second
Motion Picture Experts Group specification (MPEG-2),
has been adopted by the International Standards
Organization (ISO) and the International Telecommunications
Union (ITU). MPEG-2 is a coding standard specified for
one video sequence. MPEG-2 has also been recently
shown to be applicable to two sequences of stereoscopic
signals through the use of additional vectors. For
purposes of this application, the relevant parts of
sections 6 and 7 of the ISO document DIS 13818-2 will
be hereinafter referred to as the "MPEG-2 standard."

However, extending the number of viewpoint videos
beyond two views cannot be done practically by using
the same methodology, as the number of vectors would
grow exponentially. Instead, a multi-viewpoint
coder/decoder should compress the digital information
so that information can be sent using as little
bandwidth as possible.

In addition, a multi-viewpoint coder/decoder
should be compatible with prior standards. In other
words, while a TV may not properly show the different
viewpoints in the multi-viewpoint video, the TV should
be able to decode one viewpoint.

A multi-viewpoint coder/decoder should also be
open-ended. In this manner, individual coding modules
can be improved in accordance with any technological
advances as well as the creativity and inventive
spirits of software providers. An open-ended scheme
would also allow a person to adjust the quality of the
multi-viewpoint video according to system requirements
and variables. Furthermore, such a scheme would be
easily expandable to provide as many video viewpoints
as desired.

Finally, a multi-viewpoint coder/decoder should be
hardware-based, instead of software-based. In this
manner, fast and efficient coding/decoding can be
achieved.

Summary of the Invention 

The multi-viewpoint video encoder disclosed herein
comprises a depth estimator, a predictor connected to
the depth estimator, and a comparator connected to the
predictor. In addition, the multi-viewpoint video
encoder has an output, preferably including a
multiplexer for multiplexing the first image, the depth map,
the second viewpoint vector and the prediction errors
into a signal.

The multi-viewpoint video encoder also includes a
depth map encoder/compressor. The depth map is
compressed according to a video compression standard,
preferably compatible with the MPEG-2 standard.

The multi-viewpoint video encoder further includes
a first image encoder. The first image is encoded
according to a video coding standard, preferably
compatible with the MPEG-2 standard. In this manner,
an MPEG-2 monitor can display the first image video
without any further modifications.

Many of the elements described above are already
found in MPEG-2 encoders. Accordingly, a
multi-viewpoint video encoder only requires the addition of
the depth estimator and the predictor mentioned above.

To encode multi-viewpoint video, a first image
having a first viewpoint vector is selected. A depth
map is formed for this image. A second image having a
second viewpoint vector is also selected. A predicted
second image having the second viewpoint vector is then
formed by manipulating the first image and the depth
map to reflect the second viewpoint vector. The
prediction errors required for reconstructing the
second image from the predicted second image are
calculated by comparing the second image and the
predicted second image.

The first image, the depth map, the second
viewpoint vector and the prediction errors are
transmitted; preferably, they are multiplexed into a
signal. Before transmission, the depth map could be
compressed according to a video compression standard,
preferably compatible with the MPEG-2 standard.
Similarly, the first image should be encoded according
to a video coding standard, such as the MPEG-2
standard.

The multi-viewpoint video decoder disclosed herein
comprises a receiver, a predictor connected to the
receiver, and a reconstructor connected to the receiver
and the predictor. The predictor further includes a
manipulator. In addition, the multi-viewpoint video
decoder may include a depth map decompressor connected
between the receiver and the predictor.

Many of the elements described above are already
found in MPEG-2 decoders. Accordingly, a
multi-viewpoint video decoder only requires the addition of
the predictor, as mentioned above.

In order to provide video in a desired viewpoint,
the multi-viewpoint video decoder must include a
receiver and a predictor connected to the receiver.
This predictor has a manipulator. The multi-viewpoint
video decoder may also include a depth map decompressor
connected between the receiver and the predictor.

In addition, the multi-viewpoint video decoder
further includes a constructor connected to the
predictor. The constructor also includes a memory.

As discussed above, many of the elements required
are already found in MPEG-2 decoders. Accordingly, a
multi-viewpoint video decoder requires only the
addition of the predictor mentioned above. The
multi-viewpoint video decoder may also include a constructor
connected to the predictor. Such a decoder should also
include means for obtaining the desired viewpoint
vector.

To decode multi-viewpoint video, a decoder must
receive a first image having a first viewpoint, a depth
map, a second viewpoint vector and prediction errors.
A predicted second image having the second viewpoint
vector is then formed by manipulating the first image
and the depth map to reflect the second viewpoint
vector. Further, a second image having the second
viewpoint vector is then reconstructed by combining the
prediction errors and the predicted second image.

If a viewpoint different from the second viewpoint
is desired, the following method applies: a decoder
must receive a first image having a first viewpoint, a
depth map, a second viewpoint vector and prediction
errors. A predicted second image having the desired
viewpoint vector is then formed by manipulating the
first image and the depth map to reflect the desired
viewpoint vector. If possible, a second image having
the desired viewpoint vector can be constructed by
combining a first stored mesh, a second stored mesh, a
first stored image, a second stored image, and the
predicted second image. The first stored image is a
nearest past stored image reconstructed by combining
the prediction errors and the predicted second image.
The first stored mesh is a stored mesh respective to
the nearest stored past reconstructed image.
Similarly, the second stored image is a nearest future
image reconstructed by combining the prediction errors
and the predicted second image. The second stored mesh
is a stored mesh respective to the nearest stored
future reconstructed image.

Brief Description of the Drawings

Now the present invention will be described in
detail by way of exemplary embodiments with reference
to the accompanying drawings in which:

FIG. 1 illustrates the viewpoint image arrangement
referred to throughout the specification;

FIG. 2 illustrates a block diagram of an
embodiment of the multi-viewpoint encoder of the
present invention;

FIG. 3 is a flow chart illustrating the encoding
process of the multi-viewpoint encoder of the present
invention;

FIG. 4 is a "round robin" prediction structure for
the encoder selection of viewpoints, wherein the
encoder only selects one viewpoint at a time;

FIG. 5 is two alternative "round robin" prediction
structures for the encoder selection of viewpoints,
wherein the encoder selects two viewpoints at a time;

FIG. 6 illustrates a block diagram of an
embodiment of the multi-viewpoint decoder of the present
invention; and

FIG. 7 is a flow chart illustrating the decoding
process of the multi-viewpoint decoder of the present
invention.

Detailed Description 

FIG. 1 illustrates the viewpoint image arrangement,
i.e., the positioning of the cameras, to be
encoded by the multi-viewpoint video encoder of the
present invention. The images referred to hereinafter
will correspond to the viewpoint image arrangement.
Accordingly, I_C will have a central viewpoint, I_T will
have a top viewpoint, I_B will have a bottom viewpoint,
I_R will have a right viewpoint, and I_L will have a left
viewpoint.

FIG. 2 schematically illustrates an embodiment of
the multi-viewpoint video encoder of the present
invention. The encoder has a depth estimator 10. The
depth estimator 10 creates a depth map D_C^t for the
central image I_C^t. The central image I_C^t has a first
viewpoint vector, namely the central viewpoint vector.
The depth map D_C^t is created from the multiple viewpoint
images, in the manner described below.

The depth of an object can be geometrically
calculated if two or more perspectives of the object
are given. First, the positions of the object in each
of the available viewpoint images must be located. The
simplest method is to use the same matching techniques
used in estimating motion for a temporal sequence of
images. These techniques include: (1) correlation
matching, as described in Andreas Kopernik and Danielle
Pele, "Disparity Estimation for Stereo Compensated 3DTV
Coding," 1993 Picture Coding Symposium, March 1993,
Lausanne, Switzerland; (2) relaxation matching, as
described in D. Marr and T. Poggio, "Cooperative
Computation of Stereo Disparity," Science, vol. 194,
pp. 283-287 (1976); and (3) coarse-to-fine matching, as
described in Dimitrios Tzovaras, Michael G. Strintzis,
and Ioannis Pitas, "Multiresolution Block Matching
Techniques for Motion and Disparity Estimation," 1993
Picture Coding Symposium, March 1993, Lausanne,
Switzerland. Other algorithms can be found throughout
the computer vision field of art.

After locating the object, the difference in image
coordinates is termed disparity. The depth distance of
the object is inversely proportional to the derived
disparity. Depth estimation/disparity estimation
algorithms are widely available in the current literature.
A few classical methods for calculating depth are
provided in Berthold Klaus Paul Horn, Robot Vision,
MIT Press (1986), and Stephen Barnard and Martin
Fischler, "Computational Stereo," in ACM Computing
Surveys, vol. 14, no. 4, Dec. 1982, pp. 553-572.
Another method was described in Shree K. Nayar,
Masahiro Watanabe, Minori Noguchi, "Real-Time Focus
Range Sensor," Fifth International Conference on
Computer Vision, Cambridge, Mass., June 1995. Other
algorithms can be found throughout the computer vision
field of art.
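
The matching and depth steps above can be sketched in a few
lines of Python. This is only an illustration under stated
assumptions, not the estimator of the invention: it assumes
two rectified views with purely horizontal disparity, uses a
sum-of-absolute-differences block search as a simple stand-in
for the cited correlation matching, and converts disparity to
depth with the pinhole relation Z = f * B / d. The names
block_match_disparity, depth_from_disparity, focal_length_px
and baseline are hypothetical.

```python
import numpy as np


def block_match_disparity(left, right, block=3, max_disp=4):
    """Estimate horizontal disparity per pixel by exhaustive block search.

    Uses the sum of absolute differences (SAD) as the matching cost,
    a simple stand-in for the correlation matching cited in the text.
    """
    h, w = left.shape
    half = block // 2
    # Edge-pad both views so every pixel has a full block around it.
    lp = np.pad(left.astype(np.float64), half, mode="edge")
    rp = np.pad(right.astype(np.float64), half, mode="edge")
    disparity = np.zeros((h, w), dtype=np.int32)
    for y in range(h):
        for x in range(w):
            ref = lp[y:y + block, x:x + block]
            best_cost, best_d = np.inf, 0
            # A point at column x in the left view is searched at
            # column x - d in the right view (assumed geometry).
            for d in range(min(max_disp, x) + 1):
                cand = rp[y:y + block, x - d:x - d + block]
                cost = np.abs(ref - cand).sum()
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disparity[y, x] = best_d
    return disparity


def depth_from_disparity(disparity, focal_length_px, baseline):
    """Depth is inversely proportional to disparity: Z = f * B / d."""
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full(disparity.shape, np.inf)  # zero disparity: infinitely far
    valid = disparity > 0
    depth[valid] = focal_length_px * baseline / disparity[valid]
    return depth
```

A hardware realization, as the specification prefers, would
replace the nested loops with parallel comparison logic; the
arithmetic stays the same.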

The matching and disparity algorithms mentioned
above can be used in the preferred embodiments of the
invention. The specific algorithm to be used in
matching and determining disparity, however, depends on
the system capabilities, including processing speed,
bandwidth capability, desired picture quality, number
of available viewpoint images, etc. Nevertheless, the
algorithms should be translated into a hardware
solution, either hard-wired, logic table-based, etc.,
so that the images can be processed at a faster rate
than with a software solution.

The central image I_C^t is then encoded by the image
encoder 16 in a format compatible with section 7 of the
ISO document DIS 13818-2. Such an encoder is described
in U.S. Patent 5,193,004, issued to Feng Ming Wang and
Dimitris Anastassiou. By encoding the image I_C^t in a
format compatible with the MPEG-2 specification, any
MPEG-2 monitor may be able to decode the information
and display the image. Such a monitor, however, will
not be able to decode the multi-viewpoint video unless
it is equipped with the extra hardware described
below. Similarly, the depth map D_C^t is also encoded
and compressed in a format that is compatible with
section 7 of the DIS 13818-2 and/or MPEG Test Model 5
(ISO Doc. ISO-IEC/JTC1/SC29/WG11/N0400), by the
encoder/compressor 17. Such an encoder is described in
U.S. Patent 5,193,004, issued to Feng Ming Wang and
Dimitris Anastassiou.

After being encoded, both the image I_C^t and the
depth map D_C^t are decoded by decoder 22 and decoder 23,
respectively. By using the decoded image and depth
map (hereinafter image I_C^t and depth map D_C^t,
respectively), the encoder will base its coding on the same
data the decoder will receive, allowing for better
results.

The predictor 20 predicts a predicted second image
having a second selected viewpoint vector. The
predictor 20 contains three essential components.
First, a matrix manipulator 12 forms a mesh or 3-D
matrix M^t by combining the image I_C^t and the depth map
D_C^t. For every image point I_C^t(x_C, y_C) there is provided a
corresponding depth value z_C = D_C^t(x_C, y_C). Accordingly,
this set of 3D coordinate information (x_C, y_C, z_C) is
similar to a 3D geometrical model or mesh. In other
words, by combining the two-dimensional matrix I_C^t with
the corresponding depth values from the depth map D_C^t, a
3-D matrix or mesh is created. A corresponding texture
map incorporating the intensity values for each
coordinate is also kept. This process is further
explained in James Foley et al., Computer Graphics:
Principles and Practice, Addison-Wesley Publishing Co.
(2d ed. 1990). In addition, hardware-based solutions
for this manipulator can be found throughout the
computer graphics field.
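
As a rough illustration of the mesh formation just described,
the following Python sketch pairs each pixel coordinate
(x_C, y_C) with its depth value z_C and keeps the intensity
values as a separate texture array. The function name
build_mesh and the flat vertex-list layout are assumptions
made for this example; the specification does not prescribe
a data layout.

```python
import numpy as np


def build_mesh(image, depth_map):
    """Combine an image and its depth map into 3-D points plus a texture.

    Returns (vertices, texture): vertices has one (x, y, z) row per
    pixel, and texture holds the matching intensity values.
    """
    if image.shape[:2] != depth_map.shape:
        raise ValueError("image and depth map must have the same height and width")
    h, w = depth_map.shape
    ys, xs = np.mgrid[0:h, 0:w]  # pixel coordinates, row-major order
    vertices = np.stack(
        [xs.ravel(), ys.ravel(), depth_map.ravel()], axis=1
    ).astype(np.float64)
    # One texture row per vertex; works for grayscale or multi-channel images.
    texture = image.reshape(h * w, -1)
    return vertices, texture
```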

In addition, the predictor 20 has a vector
selector 13. The vector selector 13 selects a vector
V_X^t. The vector V_X^t is selected on a "round robin"
rotational basis amongst the directional vectors of the
four non-central images of FIG. 1, i.e., I_L, I_B, I_R, and
I_T. As shown in FIG. 4, the selected vector/image
sequence as related to time t would be I_L^t, I_B^(t+1),
I_R^(t+2), I_T^(t+3), I_L^(t+4), I_B^(t+5), I_R^(t+6), etc.
As discussed below,
FIG. 5 illustrates alternative selected vectors/images
sequences as related to time t if the bandwidth permits
the encoding of three images.
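
The rotation among the four non-central viewpoints can be
sketched as follows. The labels "L", "B", "R", "T" and the
function name round_robin_schedule are illustrative
assumptions; the order left, bottom, right, top follows the
listing of the non-central images in the text.

```python
from itertools import cycle, islice

# Non-central viewpoints in the order given in the text.
VIEWPOINTS = ("L", "B", "R", "T")


def round_robin_schedule(start_time, count, order=VIEWPOINTS):
    """Return (time, viewpoint) pairs, one selected viewpoint per time step."""
    return [
        (start_time + k, name)
        for k, name in enumerate(islice(cycle(order), count))
    ]
```

For example, round_robin_schedule(0, 5) pairs times 0 to 4
with L, B, R, T, L in turn.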

Finally, referring again to FIG. 2, the predictor
20 also includes a combiner 14. The combiner 14
interpolates the mesh M^t with the selected vector V_X^t.
In this manner, the resulting predicted image PI_X^t will
portray the mesh M^t in the viewpoint of vector V_X^t. This
process is further explained in James Foley et al.,
Computer Graphics: Principles and Practice, Addison-Wesley
Publishing Co. (2d ed. 1990). In addition,
hardware-based solutions for this combiner can be found
throughout the computer graphics field.
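
One simple way to picture the combiner is a forward warp in
which each pixel is shifted by a parallax inversely
proportional to its depth, with a z-buffer resolving
collisions. The sketch below is only that: it assumes the
viewpoint vector can be reduced to a 2-D camera offset
(dx, dy), that depth values are positive, and that a single
scale factor relates offset to pixel shift. None of these
choices comes from the specification, and the name
render_from_viewpoint is hypothetical.

```python
import numpy as np


def render_from_viewpoint(image, depth_map, view_offset, scale=1.0):
    """Forward-warp an image to a shifted viewpoint using its depth map.

    Returns (predicted, filled): predicted is the warped image and
    filled marks which output pixels received a value.
    """
    h, w = depth_map.shape
    predicted = np.zeros_like(image)
    filled = np.zeros((h, w), dtype=bool)
    zbuf = np.full((h, w), np.inf)  # nearest depth written so far
    dx, dy = view_offset
    for y in range(h):
        for x in range(w):
            z = depth_map[y, x]  # assumed strictly positive
            # Parallax shift is inversely proportional to depth.
            xn = int(round(x - scale * dx / z))
            yn = int(round(y - scale * dy / z))
            if 0 <= xn < w and 0 <= yn < h and z < zbuf[yn, xn]:
                zbuf[yn, xn] = z  # nearer surfaces hide farther ones
                predicted[yn, xn] = image[y, x]
                filled[yn, xn] = True
    return predicted, filled
```

Pixels left unfilled correspond to regions not visible from
the central viewpoint, which is one reason prediction errors
are still needed.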

The output of the vector selector 13 is used to
trigger selector 11. The selector 11 assures that the
image I_X^t sent to the comparator 15 will have the same
viewpoint as the selected vector V_X^t. In other words,
if the selected vector V_X^t is the viewpoint vector of
image I_L^t, selector 11 will send image I_L^t to the
comparator 15.

The comparator 15 then compares the predicted
image PI_X^t with the selected image I_X^t in order to
calculate the prediction errors PE^t required to
reconstruct image I_X^t from predicted image PI_X^t. The
prediction errors PE^t are calculated by examining the
differences between the image I_X^t and the predicted
image PI_X^t. The comparator 15 calculates prediction
errors in the usual manner of MPEG-2 encoders, i.e.,
compatible with section 7 of the ISO document DIS
13818-2. The prediction error encoder 18 then encodes
the prediction errors PE^t according to the MPEG-2
specification.
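
In its simplest form, the comparison reduces to a per-pixel
difference, and reconstruction is the inverse addition. The
sketch below assumes 8-bit images and works losslessly in the
pixel domain; an MPEG-2 encoder would additionally transform
and quantize the residual, which is omitted here. The
function names are illustrative.

```python
import numpy as np


def prediction_errors(actual, predicted):
    """Signed per-pixel difference between the actual and predicted image."""
    # Widen to int16 so negative differences are representable.
    return actual.astype(np.int16) - predicted.astype(np.int16)


def reconstruct(predicted, errors):
    """Rebuild the image by adding the errors back to the prediction."""
    restored = predicted.astype(np.int16) + errors
    return np.clip(restored, 0, 255).astype(np.uint8)
```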

The encoded central image I_C^t, depth map D_C^t and
prediction errors PE^t are then multiplexed into a
signal S along with the selected vector V_X^t by the
output/multiplexer 19. The MPEG-2 syntax of the
encoded bitstreams is found in section 6 of the ISO
document DIS 13818-2. Additionally, the encoder may
also transmit an MPEG-2 header containing the
directional information, i.e., the directional vectors, of
I_C^t, I_L^t, I_B^t, I_R^t, and I_T^t.
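
To make the multiplexing step concrete, the following sketch
packs four byte strings into one signal using a 4-byte length
prefix per field and unpacks them again. This framing is an
assumption chosen for brevity; it is not the MPEG-2 bitstream
syntax of section 6, and the field names are hypothetical.

```python
import struct

# Fixed field order shared by the multiplexer and the demultiplexer.
FIELDS = ("image", "depth_map", "vector", "errors")


def multiplex(streams):
    """Concatenate the encoded streams, each preceded by its byte length."""
    out = bytearray()
    for name in FIELDS:
        payload = streams[name]
        out += struct.pack(">I", len(payload)) + payload
    return bytes(out)


def demultiplex(signal):
    """Split a multiplexed signal back into its named streams."""
    streams, pos = {}, 0
    for name in FIELDS:
        (length,) = struct.unpack_from(">I", signal, pos)
        pos += 4
        streams[name] = signal[pos:pos + length]
        pos += length
    return streams
```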

The comparator 15, the encoders 16, 17 and 18, the
output/multiplexer 19, and the decoders 22 and 23 are
all found in MPEG-2 encoder 21.

FIG. 3 illustrates the flow chart of the method
for encoding multi-viewpoint video. In Step 101, the
images I_C^t, I_L^t, I_B^t, I_R^t, and I_T^t are inputted into the
multi-viewpoint video encoder of FIG. 2. The central
image I_C^t is then encoded and outputted according to the
MPEG-2 specification (ST 102). In addition, the
encoded image I_C^t is decoded for use within the process
(herein image I_C^t).

A depth map D_C^t is then calculated using the
information in images I_C^t, I_L^t, I_B^t, I_R^t, and I_T^t as
mentioned above (ST 103). The depth map D_C^t is also
encoded and outputted according to the MPEG-2
specification (ST 104). In addition, the encoded depth map D_C^t
is decoded for use within the process (herein depth
map D_C^t).

In Step 105, a vector V_X^t is selected on a
"round robin" rotational basis amongst the directional
vectors of the four non-central images of FIG. 1, i.e.,
I_L, I_B, I_R, and I_T. As shown in FIG. 4, the selected
vector/image sequence as related to time t would be I_L^t,
I_B^(t+1), I_R^(t+2), I_T^(t+3), I_L^(t+4), I_B^(t+5),
I_R^(t+6), etc.
An equivalent step would be to select the images,
instead of the vectors, on a rotational basis.

In Step 107, a mesh or 3-D matrix M^t is formed by
manipulating the image I_C^t and the depth map D_C^t as described
above. A corresponding texture map incorporating the
intensity values for each coordinate is also kept. The
mesh M^t is then combined, or interpolated, with the
selected vector V_X^t (ST 108). In this manner, the
resulting predicted image PI_X^t will portray the mesh M^t
in the viewpoint of vector V_X^t.

The predicted image PI_X^t is compared with the
selected image I_X^t in order to calculate the prediction
errors PE^t required to reconstruct image I_X^t from
predicted image PI_X^t (ST 109). The prediction errors PE^t
are calculated by examining the differences between the
image I_X^t and the predicted image PI_X^t.

If bandwidth allows, another vector can be
selected so that the prediction errors for the new
viewpoint can be determined (ST 111). FIG. 5
illustrates two possible selected vectors/images
sequences as related to time t. Otherwise, the entire
process starts over (ST 111).

The prediction errors PE^t are then encoded and
outputted according to the MPEG-2 specification
(ST 110). Similarly, the selected vector V_X^t is also
outputted (ST 106).

FIG. 6 schematically illustrates an embodiment of
the multi-viewpoint video decoder of the present
invention. The multi-viewpoint video decoder has an
input/demultiplexer 60. The input/demultiplexer 60
receives a signal S and demultiplexes the information
corresponding to the central image I_C^t, the depth
map D_C^t, the selected viewpoint vector V_X^t and the
prediction errors PE^t.

In addition, the multi-viewpoint video decoder has
an image decoder 61, a decoder/decompressor 62 and a
prediction error decoder 63 for decoding the central
image I_C^t, the depth map D_C^t, and the prediction errors
PE^t, respectively. These decoders comply with the MPEG-2
standard and, more specifically, section 7 of the
ISO document DIS 13818-2. In addition, the
input/demultiplexer 60, the image decoder 61, the
decoder/decompressor 62 and the prediction error
decoder 63 are part of the MPEG-2 decoder 75. Once
decoded, the image I_C^t and the selected viewpoint
vector V_X^t are stored in memory 69.

The multi-viewpoint video decoder also has a
vector input 64. A person can input any desired
vector V_U^t to display through any variation of vector
input 64, including a head tracker, a joystick, a
mouse, a light pen, a trackball, a desk pad, verbal
commands, etc.

A predictor 76 contains two essential elements:
a matrix manipulator 65 and a combiner 66. The matrix
manipulator 65 forms a mesh or 3-D matrix M^t by
combining the image I_C^t and the depth map D_C^t, in the
manner described above. This resulting mesh M^t is
stored in a memory 69. A corresponding texture map
incorporating the intensity values for each coordinate
is also kept. The combiner 66 interpolates the mesh M^t
with the desired vector V_U^t. In this manner, the
resulting predicted image PI_U^t will portray the mesh M^t
in the viewpoint of vector V_U^t. These processes are
further explained in James Foley et al., Computer
Graphics: Principles and Practice, Addison-Wesley
Publishing Co. (2d ed. 1990). In addition,
hardware-based solutions for the matrix manipulator and the
combiner can be found throughout the computer graphics
field.

A switch 67 is dependent on the relation between
the desired vector V_U^t and the selected vector V_X^t. If
both vectors are equal, the predicted image PI_U^t is then
combined with the prediction errors PE^t via the
prediction error combiner 68. (The prediction error combiner
68 is also part of the MPEG-2 decoder 75.) The
resulting reconstructed image I_X^t is then stored in memory 69
and outputted via the output 72.

If the desired vector V_U^t and the selected
vector V_X^t are not equal, the constructor 77 is then
used. The constructor 77 has several essential
elements: the memory 69, the mesh imagers MS1 and MS2,
the warping module 70, and the constructing module 71.
The nearest past reconstructed image in the desired
viewpoint I_U^(t-f), the mesh M^(t-f) respective to the nearest
past reconstructed image I_U^(t-f), the nearest future
reconstructed image in the desired viewpoint I_U^(t+B),
and the mesh M^(t+B) respective to the nearest future
reconstructed image I_U^(t+B), all stored in memory 69, are
retrieved upon the input of the desired viewpoint
vector V_U^t.

The nearest past reconstructed image I_U^(t-f) and its
respective mesh M^(t-f) are combined to form a nearest past
mesh image MI_U^(t-f) by the mesh imager MS1. Similarly, the
nearest future reconstructed image I_U^(t+B) and its
respective mesh M^(t+B) are combined to form a nearest future
mesh image MI_U^(t+B) by the mesh imager MS2. This process
is further explained in James Foley et al., Computer
Graphics: Principles and Practice, Addison-Wesley
Publishing Co. (2d ed. 1990). In addition,
hardware-based solutions for this combiner can be found
throughout the computer graphics field.

The nearest past mesh image MI_U^(t-f) and the nearest
future mesh image MI_U^(t+B) are then warped by the warping
module 70 to form an intermediate mesh image MPI_U^t for
the time t. Additionally, the warping procedure should
weigh the desired time t in order to provide a proper
intermediate mesh image. Accordingly, if the time t
is closer to time t-f than to time t+B, the warped
intermediate mesh image will reflect an image closer
to the image at time t-f rather than at time t+B.
The warping process is further explained in
George Wolberg, Digital Image Warping, IEEE Computer
Society Press (1990). In addition, hardware-based
solutions for this warping module can be found
throughout the computer graphics field.
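
The temporal weighting can be illustrated with a linear blend
in which the weight of each stored mesh image is proportional
to its closeness in time. This is a deliberately simplified
stand-in: a true warp, as in the cited reference, also moves
geometry rather than only mixing intensities. The linear
weights and the name blend_mesh_images are assumptions made
for this sketch.

```python
import numpy as np


def blend_mesh_images(past, future, f, b):
    """Blend the past (t - f) and future (t + b) images for time t.

    The nearer image in time receives the larger weight.
    """
    if f < 0 or b < 0 or f + b == 0:
        raise ValueError("f and b must be non-negative and not both zero")
    w_past = b / (f + b)    # small f (recent past) gives a large past weight
    w_future = f / (f + b)
    return w_past * past.astype(np.float64) + w_future * future.astype(np.float64)
```

With f = 1 and b = 3, for instance, the past image contributes
three quarters of the result, matching the requirement that
the intermediate image resemble the nearer one.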

This mesh image is then combined with the
predicted image PI_U^t by the constructing module 71. The
combination process is further explained in Y.T. Zhou,
"Multi-Sensor Image Fusion," International Conference
on Image Processing, Austin, Texas, U.S.A. (1994). The
constructing module 71 can be as simple as an exclusive
OR (XOR) logic gate. In addition, other hardware-based
solutions for this constructing module can be found
throughout the computer vision/image fusion field.

The resulting constructed image I_U^t is then
outputted via the output 72.

The mesh imaging, warping and construction 
algorithms to be used depend on the system 
capabilities, including processing speed, bandwidth 
capability, desired picture quality, number of 
available viewpoint images, etc. Nevertheless, these 
algorithms should be translated into a hardware 
solution, either hard-wired, logic table-based, etc.,
so that the images can be processed at a faster rate 
than with a software solution. 

FIG. 7 illustrates the flow chart of the method
for decoding multi-viewpoint video. In Step 201, the
image I_C^t, the depth map D_C^t, the selected viewpoint
vector V_X^t, and the prediction errors PE^t are inputted
into the multi-viewpoint video decoder of FIG. 6.
Similarly, a user-desired vector V_U^t is selected and
inputted (ST 202).

The image I_C^t and the depth map D_C^t are combined
through matrix manipulations to form a mesh or 3-D
matrix M^t, in the manner described above (ST 203). A
corresponding texture map incorporating the intensity
values for each coordinate is also kept. Further, the
mesh M^t is interpolated with the desired vector V_U^t to
form predicted image PI_U^t, which portrays the mesh M^t in
the viewpoint of vector V_U^t (ST 204).

Step 205 is dependent on the relation between the
desired vector V_U^t and the selected vector V_X^t. If both
vectors are equal, the predicted image PI_U^t is then
combined with the prediction errors PE^t (ST 211). The
resulting reconstructed image I_X^t is then stored
(ST 212) and outputted (ST 213). Then the process
starts over again.

However, if the desired vector V_U^t and the selected
vector V_X^t are not equal, the nearest past
reconstructed image in the desired viewpoint I_U^(t-f), the
mesh M^(t-f) respective to the nearest past reconstructed
image I_U^(t-f), the nearest future reconstructed image in the
desired viewpoint I_U^(t+B), and the mesh M^(t+B) respective to
the nearest future reconstructed image I_U^(t+B) are
retrieved from memory (ST 206). The nearest past
reconstructed image I_U^(t-f) and its respective mesh M^(t-f) are
combined to form a nearest past mesh image MI_U^(t-f)
(ST 207). Similarly, the nearest future reconstructed
image I_U^(t+B) and its respective mesh M^(t+B) are combined to
form a nearest future mesh image MI_U^(t+B) (ST 207). The
nearest past mesh image MI_U^(t-f) and the nearest future mesh
image MI_U^(t+B) are then warped to form an intermediate mesh
image MPI_U^t for the time t (ST 208). Additionally, the
warping procedure should weigh the desired time t in
order to provide a proper intermediate mesh image.
Accordingly, if the time t is closer to time t-f than
to time t+B, the warped intermediate mesh image will
reflect an image closer to the image at time t-f rather
than at time t+B.

This mesh image is then combined with the
predicted image PI_U^t (ST 209). The resulting
constructed image I_U^t is then outputted (ST 210). Then
the process starts over again.

If all the images for each non-central viewpoint
are desired, i.e., I_L^t, I_B^t, I_R^t, I_T^t, the process described
above should be repeated for each viewpoint.

It will be understood that the invention is not
limited to the embodiments described and illustrated
herein as they have been given only as examples of the
invention. Without going beyond the scope of the
invention as defined by the claims, certain
arrangements may be changed or certain components may be
replaced by equivalent components. For example, the
depth map D_C^t and the image I_C^t need not be manipulated
together to form a mesh M^t, which is later combined with
a viewpoint vector. Instead, both the depth map D_C^t and
the image I_C^t can each be combined with the viewpoint
vector and later be reconstructed. Similarly, the
nearest past and future meshes need not be stored in
memory. Instead, the nearest past and future images
can be stored in memory and later combined with stored
depth maps to form the meshes.

Claims

1. A multi-viewpoint video encoder comprising:
a depth estimator;
a predictor connected to the depth estimator;
and
a comparator connected to the predictor.

2. The multi-viewpoint video encoder of claim 1,
further comprising an output.

3. The multi-viewpoint video encoder of claim 2,
wherein the output further comprises a multiplexer
for multiplexing the first image, the depth map,
the second viewpoint vector and the prediction
errors.



4. The multi-viewpoint video encoder of claim 1,
further comprising a depth map compressor.

5. The multi-viewpoint video encoder of claim 4,
wherein the depth map is compressed according to a
video compression standard.

6. The multi-viewpoint video encoder of claim 5,
wherein the video compression standard is
compatible with the MPEG-2 standard.

7. The multi-viewpoint video encoder of claim 1,
further comprising a first image encoder.

8. The multi-viewpoint video encoder of claim 7,
wherein the first image encoder encodes the first
image according to a video coding standard.

9. The multi-viewpoint video encoder of claim 8,
wherein the video coding standard is compatible
with the MPEG-2 standard.

1 10. A multi -viewpoint video encoder comprising: 

2 an MPEG -2 encoder; 

3 a depth estimator connected to the MPEG- 2 

4 encoder; and 

5 a predictor connected to the depth estimator 

6 and the MPEG- 2 encoder. 

1 11. The multi -viewpoint video encoder of claim 10, 

2 wherein the predictor further comprises a 

3 manipulator, a combiner, and a vector selector. 

1 12. A multi -viewpoint video decoder comprising: 

2 a receiver; 

3 a predictor connected to the receiver; and 

4 a reconstructor connected to the predictor 

5 and the receiver. 

1 13. The multi -viewpoint video decoder of claim 12, 

2 wherein the predictor further comprises a 

3 manipulator and a combiner. 

1 14. The multi- viewpoint video decoder of claim 12, 

2 further comprising a depth map decompressor. 

1 15 . A multi -viewpoint video decoder comprising: 

2 an MPEG -2 decoder; and 

3 a predictor connected to the MPEG- 2 decoder. 

1 16. The multi -viewpoint video decoder of claim 15, 

2 wherein the predictor further comprises a 

3 manipulator and a combiner. 

17. A multi-viewpoint video decoder comprising:

a receiver; and

a predictor connected to the receiver.

18. The multi-viewpoint video decoder of claim 17,
wherein the predictor further comprises a
manipulator and a combiner.

19. The multi-viewpoint video decoder of claim 17,
further comprising a depth map decompressor
connected between the predictor and the receiver.

20. The multi-viewpoint video decoder of claim 17,
further comprising a desired viewpoint vector
input.

21. The multi-viewpoint video decoder of claim 17,
further comprising:

a constructor connected to the predictor.

22. The multi-viewpoint video decoder of claim 21, the
constructor further comprising a memory.

23. A multi-viewpoint video decoder comprising:

an MPEG-2 decoder; and

a predictor connected to the MPEG-2 decoder.

24. The multi-viewpoint video decoder of claim 23,
wherein the predictor further comprises a
manipulator and a combiner.

25. The multi-viewpoint video decoder of claim 23,
further comprising a constructor connected to the
predictor.
26. The multi-viewpoint video decoder of claim 25,
further comprising a desired viewpoint vector
input.

27. A method for encoding multi-viewpoint video,
comprising the steps of:

selecting a first image having a first
viewpoint vector;

forming a depth map for the first image;

selecting a second image having a second
viewpoint vector;

predicting a predicted second image having
the second viewpoint vector by manipulating the
first image and the depth map to reflect the
second viewpoint vector; and

calculating prediction errors for
reconstructing the second image from the predicted
second image.

28. The method of encoding multi-viewpoint video of
claim 27, further comprising the step of
transmitting the first image, the depth map, the
second viewpoint vector and the prediction errors.

29. The method of encoding multi-viewpoint video of
claim 28, wherein the transmission step comprises
multiplexing the first image, the depth map, the
second viewpoint vector and the prediction errors
into a signal.

30. The method of encoding multi-viewpoint video
of claim 27, wherein the prediction errors
calculation step comprises comparing the second
image and the predicted second image.
31. The method of encoding multi-viewpoint video of
claim 27, further comprising the step of
compressing the depth map.

32. The method of encoding multi-viewpoint video of
claim 31, wherein the depth map compression step
is performed according to a video compression
standard.

33. The method of encoding multi-viewpoint video of
claim 32, wherein the video compression standard
is compatible with the MPEG-2 standard.

34. The method of encoding multi-viewpoint video of
claim 27, further comprising the step of encoding
the first image.

35. The method of encoding multi-viewpoint video of
claim 34, wherein the image encoding step is
performed according to a video coding standard.

36. The method of encoding multi-viewpoint video of
claim 35, wherein the video coding standard is
compatible with the MPEG-2 standard.

37. A method for decoding multi-viewpoint video,
comprising the steps of:

receiving a first image having a first
viewpoint, a depth map, a second viewpoint vector
and prediction errors;

forming a predicted second image having the
second viewpoint vector by manipulating the first
image and the depth map to reflect the second
viewpoint vector; and

reconstructing a second image having the
second viewpoint vector by combining the
prediction errors and the predicted second image.

38. A method for decoding multi-viewpoint video,
comprising the steps of:

receiving a first image having a first
viewpoint vector, a depth map, a second viewpoint
vector and prediction errors; and

constructing a predicted second image having
a desired viewpoint vector by manipulating the
first image and the depth map to reflect the
desired viewpoint vector.

39. The method of decoding multi-viewpoint video of
claim 38, further comprising the step of obtaining
the desired viewpoint vector.

40. The method of decoding multi-viewpoint video of
claim 38, further comprising the step of
decompressing the depth map.

41. The method of decoding multi-viewpoint video of
claim 38, further comprising the steps of:

constructing a second image having the
desired viewpoint vector by combining a first
stored mesh, a second stored mesh, a first stored
reconstructed image, a second stored reconstructed
image, and the predicted second image.

42. The method of decoding multi-viewpoint video of
claim 41, wherein the first stored reconstructed
image is a nearest past stored reconstructed image
having the desired viewpoint vector.

43. The method of decoding multi-viewpoint video of
claim 42, wherein the first stored mesh is a
stored mesh respective to the nearest past stored
reconstructed image.

44. The method of decoding multi-viewpoint video of
claim 41, wherein the second stored reconstructed
image is a nearest stored future reconstructed
image having the desired viewpoint vector.

45. The method of decoding multi-viewpoint video of
claim 44, wherein the second stored mesh is a
stored mesh respective to the nearest stored
future reconstructed image.

46. The method of decoding multi-viewpoint video of
claim 41, wherein the second image construction
step further comprises the steps of:

combining the first stored mesh and the first
stored reconstructed image to form a first mesh
image;

combining the second stored mesh and the
second stored reconstructed image to form a second
mesh image;

warping the first and second mesh images to
form an intermediate mesh image; and

constructing the second image by combining
the intermediate mesh image with the predicted
second image.
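
For illustration only, the encoding steps of claim 27 and the decoding steps of claim 37 can be read as a prediction-plus-residual round trip. The following is a minimal sketch under stated assumptions, not the claimed apparatus: images are lists of integer rows, the viewpoint vector is a single integer shift, the transmitted signal is a plain dictionary rather than a multiplexed and compressed stream, and toy_predict is a hypothetical placeholder for the predictor. The names encode, decode and toy_predict do not come from the disclosure.

```python
def toy_predict(image, depth, viewpoint):
    """Hypothetical placeholder predictor (an assumption, not the
    disclosed one): move each pixel right by viewpoint // depth."""
    width = len(image[0])
    predicted = [[0] * width for _ in image]
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            target = x + viewpoint // depth[y][x]
            if 0 <= target < width:
                predicted[y][target] = value
    return predicted


def encode(first, depth, second, viewpoint):
    """Predict the second image, then keep only the prediction errors."""
    predicted = toy_predict(first, depth, viewpoint)
    errors = [[s - p for s, p in zip(s_row, p_row)]
              for s_row, p_row in zip(second, predicted)]
    return {"first": first, "depth": depth,
            "viewpoint": viewpoint, "errors": errors}


def decode(signal):
    """Repeat the same prediction and add the received errors back."""
    predicted = toy_predict(signal["first"], signal["depth"],
                            signal["viewpoint"])
    return [[p + e for p, e in zip(p_row, e_row)]
            for p_row, e_row in zip(predicted, signal["errors"])]


first = [[10, 20, 30, 40]]
depth = [[2, 2, 1, 2]]
second = [[10, 20, 25, 30]]
signal = encode(first, depth, second, viewpoint=1)
rebuilt = decode(signal)
print(signal["errors"])   # [[0, 0, 25, -10]]
print(rebuilt == second)  # True
```

In this lossless toy setting the reconstruction is exact whatever the prediction quality, because encoder and decoder run the same predictor on the same first image and depth map. A better predictor only makes the errors smaller. The sketch ignores compression of the first image and the depth map.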